Results 1 - 20 of 20
1.
BMC Bioinformatics ; 24(1): 159, 2023 Apr 20.
Article in English | MEDLINE | ID: mdl-37081398

ABSTRACT

BACKGROUND: Biomedical researchers are strongly encouraged to make their research outputs more Findable, Accessible, Interoperable, and Reusable (FAIR). While many biomedical research outputs are more readily accessible through open data efforts, finding relevant outputs remains a significant challenge. Schema.org is a metadata vocabulary standardization project that enables web content creators to make their content more FAIR. Leveraging Schema.org could benefit biomedical research resource providers, but it can be challenging to apply Schema.org standards to biomedical research outputs. We created an online browser-based tool that empowers researchers and repository developers to utilize Schema.org or other biomedical schema projects. RESULTS: Our browser-based tool includes features which can help address many of the barriers towards Schema.org compliance, such as the ability to easily browse for relevant Schema.org classes, the ability to extend and customize a class to be more suitable for biomedical research outputs, the ability to create data validation to ensure adherence of a research output to a customized class, and the ability to register a custom class to our schema registry, enabling others to search and re-use it. We demonstrate the use of our tool with the creation of the Outbreak.info schema, a large multi-class schema for harmonizing various COVID-19-related resources. CONCLUSIONS: We have created a browser-based tool to empower biomedical research resource providers to leverage Schema.org classes to make their research outputs more FAIR.


Subject(s)
Biomedical Research , COVID-19 , Humans , Metadata
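The "data validation" idea in the abstract above can be illustrated with a minimal sketch. It assumes a record shaped like a Schema.org `Dataset` and a hypothetical set of required properties; the `REQUIRED` set and the `missing_fields` helper are illustrative assumptions, not the rules or API of the tool described in the article.

```python
# Hypothetical sketch: property names follow the public Schema.org Dataset
# type, but the REQUIRED set is an assumption made for illustration only.
REQUIRED = {"@type", "name", "description", "identifier"}


def missing_fields(record: dict) -> list[str]:
    """Return the required properties that are absent or empty, sorted by name."""
    return sorted(key for key in REQUIRED if not record.get(key))


# An example record with an empty description and no identifier.
record = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example dataset",
    "description": "",
}

print(missing_fields(record))
```

A customized class would, under this assumption, simply carry a different `REQUIRED` set; richer checks (value types, controlled vocabularies) are outside this sketch.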
2.
Sci Data ; 10(1): 99, 2023 02 23.
Article in English | MEDLINE | ID: mdl-36823157

ABSTRACT

Biomedical datasets are increasing in size, stored in many repositories, and face challenges in FAIRness (findability, accessibility, interoperability, reusability). As a Consortium of infectious disease researchers from 15 Centers, we aim to adopt open science practices to promote transparency, encourage reproducibility, and accelerate research advances through data reuse. To improve FAIRness of our datasets and computational tools, we evaluated metadata standards across established biomedical data repositories. The vast majority do not adhere to a single standard, such as Schema.org, which is widely-adopted by generalist repositories. Consequently, datasets in these repositories are not findable in aggregation projects like Google Dataset Search. We alleviated this gap by creating a reusable metadata schema based on Schema.org and catalogued nearly 400 datasets and computational tools we collected. The approach is easily reusable to create schemas interoperable with community standards, but customized to a particular context. Our approach enabled data discovery, increased the reusability of datasets from a large research consortium, and accelerated research. Lastly, we discuss ongoing challenges with FAIRness beyond discoverability.


Subject(s)
Communicable Diseases , Datasets as Topic , Metadata , Reproducibility of Results , Datasets as Topic/standards , Humans
4.
Nat Methods ; 20(4): 536-540, 2023 04.
Article in English | MEDLINE | ID: mdl-36823331

ABSTRACT

Outbreak.info Research Library is a standardized, searchable interface of coronavirus disease 2019 (COVID-19) and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) publications, clinical trials, datasets, protocols and other resources, built with a reusable framework. We developed a rigorous schema to enforce consistency across different sources and resource types and linked related resources. Researchers can quickly search the latest research across data repositories, regardless of resource type or repository location, via a search interface, public application programming interface (API) and R package.


Subject(s)
COVID-19 , Humans , SARS-CoV-2 , Disease Outbreaks
5.
Nat Methods ; 20(4): 512-522, 2023 04.
Article in English | MEDLINE | ID: mdl-36823332

ABSTRACT

In response to the emergence of SARS-CoV-2 variants of concern, the global scientific community, through unprecedented effort, has sequenced and shared over 11 million genomes through GISAID, as of May 2022. This extraordinarily high sampling rate provides a unique opportunity to track the evolution of the virus in near real-time. Here, we present outbreak.info, a platform that currently tracks over 40 million combinations of Pango lineages and individual mutations, across over 7,000 locations, to provide insights for researchers, public health officials and the general public. We describe the interpretable visualizations available in our web application, the pipelines that enable the scalable ingestion of heterogeneous sources of SARS-CoV-2 variant data and the server infrastructure that enables widespread data dissemination via a high-performance API that can be accessed using an R package. We show how outbreak.info can be used for genomic surveillance and as a hypothesis-generation tool to understand the ongoing pandemic at varying geographic and temporal scales.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , Genomics , Disease Outbreaks , Mutation
6.
Nat Commun ; 13(1): 5596, 2022 09 27.
Article in English | MEDLINE | ID: mdl-36167835

ABSTRACT

Lassa fever is a severe viral hemorrhagic fever caused by a zoonotic virus that repeatedly spills over to humans from its rodent reservoirs. It is currently not known how climate and land use changes could affect the endemic area of this virus, currently limited to parts of West Africa. By exploring the environmental data associated with virus occurrence using ecological niche modelling, we show how temperature, precipitation and the presence of pastures determine ecological suitability for virus circulation. Based on projections of climate, land use, and population changes, we find that regions in Central and East Africa will likely become suitable for Lassa virus over the next decades and estimate that the total population living in ecological conditions that are suitable for Lassa virus circulation may drastically increase by 2070. By analysing geotagged viral genomes using spatially-explicit phylogeography and simulating virus dispersal, we find that in the event of Lassa virus being introduced into a new suitable region, its spread might remain spatially limited over the first decades.


Subject(s)
Lassa Fever , Lassa virus , Animals , Humans , Lassa Fever/epidemiology , Lassa virus/genetics , Phylogeography , Risk Factors , Rodentia
7.
Nat Commun ; 13(1): 4784, 2022 08 15.
Article in English | MEDLINE | ID: mdl-35970983

ABSTRACT

Regional connectivity and land travel have been identified as important drivers of SARS-CoV-2 transmission. However, the generalizability of this finding is understudied outside of well-sampled, highly connected regions. In this study, we investigated the relative contributions of regional and intercontinental connectivity to the source-sink dynamics of SARS-CoV-2 for Jordan and the Middle East. By integrating genomic, epidemiological and travel data we show that the source of introductions into Jordan was dynamic across 2020, shifting from intercontinental seeding in the early pandemic to more regional seeding for the travel restrictions period. We show that land travel, particularly freight transport, drove introduction risk during the travel restrictions period. High regional connectivity and land travel also drove Jordan's export risk. Our findings emphasize regional connectedness and land travel as drivers of transmission in the Middle East.


Subject(s)
COVID-19 , SARS-CoV-2 , COVID-19/epidemiology , Humans , Middle East/epidemiology , Pandemics/prevention & control , Travel
8.
Res Sq ; 2022 Jun 28.
Article in English | MEDLINE | ID: mdl-35794893

ABSTRACT

The emergence of SARS-CoV-2 variants of concern has prompted the need for near real-time genomic surveillance to inform public health interventions. In response to this need, the global scientific community, through unprecedented effort, has sequenced and shared over 11 million genomes through GISAID, as of May 2022. This extraordinarily high sampling rate provides a unique opportunity to track the evolution of the virus in near real-time. Here, we present outbreak.info, a platform that currently tracks over 40 million combinations of PANGO lineages and individual mutations, across over 7,000 locations, to provide insights for researchers, public health officials, and the general public. We describe the interpretable and opinionated visualizations in the variant and location focussed reports available in our web application, the pipelines that enable the scalable ingestion of heterogeneous sources of SARS-CoV-2 variant data, and the server infrastructure that enables widespread data dissemination via a high performance API that can be accessed using an R package. We present a case study that illustrates how outbreak.info can be used for genomic surveillance and as a hypothesis generation tool to understand the ongoing pandemic at varying geographic and temporal scales. With an emphasis on scalability, interactivity, interpretability, and reusability, outbreak.info provides a template to enable genomic surveillance at a global and localized scale.

9.
bioRxiv ; 2022 Jun 02.
Article in English | MEDLINE | ID: mdl-35677074

ABSTRACT

Background: Biomedical researchers are strongly encouraged to make their research outputs more Findable, Accessible, Interoperable, and Reusable (FAIR). While many biomedical research outputs are more readily accessible through open data efforts, finding relevant outputs remains a significant challenge. Schema.org is a metadata vocabulary standardization project that enables web content creators to make their content more FAIR. Leveraging schema.org could benefit biomedical research resource providers, but it can be challenging to apply schema.org standards to biomedical research outputs. We created an online browser-based tool that empowers researchers and repository developers to utilize schema.org or other biomedical schema projects. Results: Our browser-based tool includes features which can help address many of the barriers towards schema.org compliance, such as the ability to easily browse for relevant schema.org classes, the ability to extend and customize a class to be more suitable for biomedical research outputs, the ability to create data validation to ensure adherence of a research output to a customized class, and the ability to register a custom class to our schema registry, enabling others to search and re-use it. We demonstrate the use of our tool with the creation of the Outbreak.info schema, a large multi-class schema for harmonizing various COVID-19-related resources. Conclusions: We have created a browser-based tool to empower biomedical research resource providers to leverage schema.org classes to make their research outputs more FAIR.

10.
bioRxiv ; 2022 Dec 07.
Article in English | MEDLINE | ID: mdl-35132411

ABSTRACT

To combat the ongoing COVID-19 pandemic, scientists have been conducting research at breakneck speeds, producing over 52,000 peer-reviewed articles within the first year. To address the challenge in tracking the vast amount of new research located in separate repositories, we developed outbreak.info Research Library, a standardized, searchable interface of COVID-19 and SARS-CoV-2 resources. Unifying metadata from sixteen repositories, we assembled a collection of over 350,000 publications, clinical trials, datasets, protocols, and other resources as of October 2022. We used a rigorous schema to enforce consistency across different sources and resource types and linked related resources. Researchers can quickly search the latest research across data repositories, regardless of resource type or repository location, via a search interface, public API, and R package. Finally, we discuss the challenges inherent in combining metadata from scattered and heterogeneous resources and provide recommendations to streamline this process to aid scientific research.

11.
Cell ; 184(19): 4939-4952.e15, 2021 09 16.
Article in English | MEDLINE | ID: mdl-34508652

ABSTRACT

The emergence of the COVID-19 epidemic in the United States (U.S.) went largely undetected due to inadequate testing. New Orleans experienced one of the earliest and fastest accelerating outbreaks, coinciding with Mardi Gras. To gain insight into the emergence of SARS-CoV-2 in the U.S. and how large-scale events accelerate transmission, we sequenced SARS-CoV-2 genomes during the first wave of the COVID-19 epidemic in Louisiana. We show that SARS-CoV-2 in Louisiana had limited diversity compared to other U.S. states and that one introduction of SARS-CoV-2 led to almost all of the early transmission in Louisiana. By analyzing mobility and genomic data, we show that SARS-CoV-2 was already present in New Orleans before Mardi Gras, and the festival dramatically accelerated transmission. Our study provides an understanding of how superspreading during large-scale events played a key role during the early outbreak in the U.S. and can greatly accelerate epidemics.


Subject(s)
COVID-19/epidemiology , Epidemics , SARS-CoV-2/physiology , COVID-19/transmission , Databases as Topic , Disease Outbreaks , Humans , Louisiana/epidemiology , Phylogeny , Risk Factors , SARS-CoV-2/classification , Texas , Travel , United States/epidemiology
12.
Cell ; 184(10): 2587-2594.e7, 2021 05 13.
Article in English | MEDLINE | ID: mdl-33861950

ABSTRACT

The highly transmissible B.1.1.7 variant of SARS-CoV-2, first identified in the United Kingdom, has gained a foothold across the world. Using S gene target failure (SGTF) and SARS-CoV-2 genomic sequencing, we investigated the prevalence and dynamics of this variant in the United States (US), tracking it back to its early emergence. We found that, while the fraction of B.1.1.7 varied by state, the variant increased at a logistic rate with a roughly weekly doubling rate and an increased transmission of 40%-50%. We revealed several independent introductions of B.1.1.7 into the US as early as late November 2020, with community transmission spreading it to most states within months. We show that the US is on a similar trajectory as other countries where B.1.1.7 became dominant, requiring immediate and decisive action to minimize COVID-19 morbidity and mortality.


Subject(s)
COVID-19 , Models, Biological , SARS-CoV-2 , COVID-19/genetics , COVID-19/mortality , COVID-19/transmission , Female , Humans , Male , SARS-CoV-2/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/pathogenicity , United States/epidemiology
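The phrase "logistic rate with a roughly weekly doubling rate" in the abstract above can be made concrete with a short worked sketch. It assumes the standard logistic curve for a variant's share of cases; the function names, the 7-day doubling time, and the midpoint day are illustrative assumptions, not values or code from the study.

```python
import math


def growth_rate_from_doubling(doubling_days: float) -> float:
    """Per-day logistic growth rate r implied by a given doubling time."""
    return math.log(2) / doubling_days


def logistic_fraction(t: float, r: float, t_mid: float) -> float:
    """Variant fraction at day t; t_mid is the day the fraction reaches 50%."""
    return 1.0 / (1.0 + math.exp(-r * (t - t_mid)))


# Assumed values for illustration: 7-day doubling, 50% share reached on day 60.
r = growth_rate_from_doubling(7.0)
for day in (0, 30, 60, 90):
    print(day, round(logistic_fraction(day, r, 60.0), 4))
```

Under a logistic model it is the odds p/(1-p) that double exactly every ln(2)/r days; the fraction itself doubles at roughly that pace only while it is still small.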
13.
medRxiv ; 2021 Feb 07.
Article in English | MEDLINE | ID: mdl-33564780

ABSTRACT

As of January of 2021, the highly transmissible B.1.1.7 variant of SARS-CoV-2, which was first identified in the United Kingdom (U.K.), has gained a strong foothold across the world. Because of the sudden and rapid rise of B.1.1.7, we investigated the prevalence and growth dynamics of this variant in the United States (U.S.), tracking it back to its early emergence and onward local transmission. We found that the RT-qPCR testing anomaly of S gene target failure (SGTF), first observed in the U.K., was a reliable proxy for B.1.1.7 detection. We sequenced 212 B.1.1.7 SARS-CoV-2 genomes collected from testing facilities in the U.S. from December 2020 to January 2021. We found that while the fraction of B.1.1.7 among SGTF samples varied by state, detection of the variant increased at a logistic rate similar to those observed elsewhere, with a doubling rate of a little over a week and an increased transmission rate of 35-45%. By performing time-aware Bayesian phylodynamic analyses, we revealed several independent introductions of B.1.1.7 into the U.S. as early as late November 2020, with onward community transmission enabling the variant to spread to at least 30 states as of January 2021. Our study shows that the U.S. is on a similar trajectory as other countries where B.1.1.7 rapidly became the dominant SARS-CoV-2 variant, requiring immediate and decisive action to minimize COVID-19 morbidity and mortality.

14.
medRxiv ; 2021 Feb 08.
Article in English | MEDLINE | ID: mdl-33564781

ABSTRACT

The emergence of the early COVID-19 epidemic in the United States (U.S.) went largely undetected, due to a lack of adequate testing and mitigation efforts. The city of New Orleans, Louisiana experienced one of the earliest and fastest accelerating outbreaks, coinciding with the annual Mardi Gras festival, which went ahead without precautions. To gain insight into the emergence of SARS-CoV-2 in the U.S. and how large, crowded events may have accelerated early transmission, we sequenced SARS-CoV-2 genomes during the first wave of the COVID-19 epidemic in Louisiana. We show that SARS-CoV-2 in Louisiana initially had limited sequence diversity compared to other U.S. states, and that one successful introduction of SARS-CoV-2 led to almost all of the early SARS-CoV-2 transmission in Louisiana. By analyzing mobility and genomic data, we show that SARS-CoV-2 was already present in New Orleans before Mardi Gras and that the festival dramatically accelerated transmission, eventually leading to secondary localized COVID-19 epidemics throughout the Southern U.S. Our study provides an understanding of how superspreading during large-scale events played a key role during the early outbreak in the U.S. and can greatly accelerate COVID-19 epidemics on a local and regional scale.

15.
Proc Natl Acad Sci U S A ; 115(42): 10750-10755, 2018 10 16.
Article in English | MEDLINE | ID: mdl-30282735

ABSTRACT

The chemical diversity and known safety profiles of drugs previously tested in humans make them a valuable set of compounds to explore potential therapeutic utility in indications outside those originally targeted, especially neglected tropical diseases. This practice of "drug repurposing" has become commonplace in academic and other nonprofit drug-discovery efforts, with the appeal that significantly less time and resources are required to advance a candidate into the clinic. Here, we report a comprehensive open-access, drug repositioning screening set of 12,000 compounds (termed ReFRAME; Repurposing, Focused Rescue, and Accelerated Medchem) that was assembled by combining three widely used commercial drug competitive intelligence databases (Clarivate Integrity, GVK Excelra GoStar, and Citeline Pharmaprojects), together with extensive patent mining of small molecules that have been dosed in humans. To date, 12,000 compounds (∼80% of compounds identified from data mining) have been purchased or synthesized and subsequently plated for screening. To exemplify its utility, this collection was screened against Cryptosporidium spp., a major cause of childhood diarrhea in the developing world, and two active compounds previously tested in humans for other therapeutic indications were identified. Both compounds, VB-201 and a structurally related analog of ASP-7962, were subsequently shown to be efficacious in animal models of Cryptosporidium infection at clinically relevant doses, based on available human doses. In addition, an open-access data portal (https://reframedb.org) has been developed to share ReFRAME screen hits to encourage additional follow-up and maximize the impact of the ReFRAME screening collection.


Subject(s)
Antiprotozoal Agents/pharmacology , Cryptosporidiosis/drug therapy , Cryptosporidium/drug effects , Databases, Pharmaceutical , Drug Discovery , Drug Repositioning/methods , Small Molecule Libraries/pharmacology , Animals , Cryptosporidiosis/parasitology , Drug Evaluation, Preclinical/methods , Female , High-Throughput Screening Assays , Humans , Mice , Mice, Inbred C57BL
16.
Elife ; 7, 2018 05 29.
Article in English | MEDLINE | ID: mdl-29809149

ABSTRACT

Skeletal muscle comprises a family of diverse tissues with highly specialized functions. Many acquired diseases, including HIV and COPD, affect specific muscles while sparing others. Even monogenic muscular dystrophies selectively affect certain muscle groups. These observations suggest that factors intrinsic to muscle tissues influence their resistance to disease. Nevertheless, most studies have not addressed transcriptional diversity among skeletal muscles. Here we use RNAseq to profile mRNA expression in skeletal, smooth, and cardiac muscle tissues from mice and rats. Our data set, MuscleDB, reveals extensive transcriptional diversity, with greater than 50% of transcripts differentially expressed among skeletal muscle tissues. We detect mRNA expression of hundreds of putative myokines that may underlie the endocrine functions of skeletal muscle. We identify candidate genes that may drive tissue specialization, including Smarca4, Vegfa, and Myostatin. By demonstrating the intrinsic diversity of skeletal muscles, these data provide a resource for studying the mechanisms of tissue specialization.


Subject(s)
Gene Expression Profiling , Gene Expression Regulation , Muscle Proteins/metabolism , Muscle, Skeletal/cytology , Muscle, Skeletal/metabolism , Animals , Cells, Cultured , Female , High-Throughput Nucleotide Sequencing , Male , Mice , Mice, Inbred C57BL , Muscle Proteins/genetics , Muscle, Smooth/cytology , Muscle, Smooth/metabolism , Myocardium/cytology , Myocardium/metabolism , Rats , Rats, Sprague-Dawley
17.
PLoS One ; 12(11): e0187457, 2017.
Article in English | MEDLINE | ID: mdl-29095940

ABSTRACT

RNA-sequencing (RNA-seq) and microarrays are methods for measuring gene expression across the entire transcriptome. Recent advances have made these techniques practical and affordable for essentially any laboratory with experience in molecular biology. A variety of computational methods have been developed to decrease the amount of bioinformatics expertise necessary to analyze these data. Nevertheless, many barriers persist which discourage new labs from using functional genomics approaches. Since high-quality gene expression studies have enduring value as resources to the entire research community, it is of particular importance that small labs have the capacity to share their analyzed datasets with the research community. Here we introduce ExpressionDB, an open source platform for visualizing RNA-seq and microarray data accommodating virtually any number of different samples. ExpressionDB is based on Shiny, a customizable web application which allows data sharing locally and online with customizable code written in R. ExpressionDB allows intuitive searches based on gene symbols, descriptions, or gene ontology terms, and it includes tools for dynamically filtering results based on expression level, fold change, and false-discovery rates. Built-in visualization tools include heatmaps, volcano plots, and principal component analysis, ensuring streamlined and consistent visualization to all users. All of the scripts for building an ExpressionDB with user-supplied data are freely available on GitHub, and the Creative Commons license allows fully open customization by end-users. We estimate that a demo database can be created in under one hour with minimal programming experience, and that a new database with user-supplied expression data can be completed and online in less than one day.


Subject(s)
Databases, Genetic , Gene Expression , Programming Languages , Sequence Analysis, RNA , Transcriptome
18.
PLoS One ; 9(2): e87649, 2014.
Article in English | MEDLINE | ID: mdl-24503716

ABSTRACT

Water-soluble organic fluorophores are widely used as labels in biological systems. However, in many cases these fluorophores can interact strongly with lipid bilayers, influencing the interaction of the target with the bilayer and/or leading to misleading fluorescent signals. Here, we quantify the interaction of 32 common water-soluble dyes with model lipid bilayers to serve as an additional criterion when selecting a dye label.


Subject(s)
Fluorescent Dyes/chemistry , Lipid Bilayers/chemistry , Fluorescent Dyes/metabolism , Lipid Bilayers/metabolism , Protein Binding , Proteins/chemistry , Proteins/metabolism , Solubility , Spectrometry, Fluorescence , Water
19.
Langmuir ; 29(39): 12220-7, 2013 Oct 01.
Article in English | MEDLINE | ID: mdl-23992147

ABSTRACT

Solid-supported lipid bilayers are useful model systems for mimicking cellular membranes; however, the interaction of the bilayer with the surface can disrupt the function of integral membrane proteins and impede topological transformations such as membrane fusion. As a result, a variety of tethered or cushioned lipid bilayer architectures have been described, which retain the proximity to the surface, enabling surface-sensitive techniques, but physically distance the bilayer from the surface. We have recently developed a method for spatially separating a lipid bilayer from a solid support using DNA lipids. In this system, a DNA strand is covalently attached to a glass slide or SiO2 wafer, and giant unilamellar vesicles (GUVs) displaying the complement rupture to form a planar lipid bilayer tethered above the surface. However, the location of the patch is random, determined by where the DNA-GUV initially binds to its complement. To allow greater versatility and control, we sought a way to pattern tethered membrane patches. We present a method for creating spatially distinct tethered membrane patches on a glass slide using microarray printing. Surface-reactive DNA sequences are spotted onto the slide and incubated to covalently link the DNA to the surface, and DNA-GUV patches are formed selectively on the printed DNA. By interfacing the bilayers with microfluidic flow cells, materials can be added on top of or fused into the membrane to change the composition of the bilayers. With further development, this approach would enable rapid screening of different patches in protein binding assays and would enable interfacing patches with electrical detectors.


Subject(s)
DNA/chemistry , Lipid Bilayers/isolation & purification , Lipids/chemistry , Lipid Bilayers/chemistry , Molecular Structure , Particle Size , Surface Properties
20.
J Chem Inf Model ; 48(1): 220-32, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18186622

ABSTRACT

This paper attempts to elucidate differences in QSPR models of aqueous solubility (Log S), melting point (Tm), and octanol-water partition coefficient (Log P), three properties of pharmaceutical interest. For all three properties, Support Vector Machine models using 2D and 3D descriptors calculated in the Molecular Operating Environment were the best models. Octanol-water partition coefficient was the easiest property to predict, as indicated by the RMSE of the external test set and the coefficient of determination (RMSE = 0.73, r² = 0.87). Melting point prediction, on the other hand, was the most difficult (RMSE = 52.8 °C, r² = 0.46), and Log S statistics were intermediate between melting point and Log P prediction (RMSE = 0.900, r² = 0.79). The data imply that for all three properties the lack of measured values at the extremes is a significant source of error. This source, however, does not entirely explain the poor melting point prediction, and we suggest that deficiencies in descriptors used in melting point prediction contribute significantly to the prediction errors.


Subject(s)
Models, Chemical , Quantitative Structure-Activity Relationship , Transition Temperature , Artificial Intelligence , Octanols/chemistry , Solubility , Water/chemistry
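The two statistics quoted in the abstract above, RMSE and the coefficient of determination, can be computed as in the sketch below. It assumes the common definition r² = 1 - SS_res/SS_tot; the abstract does not say which definition the authors used (some QSPR work reports the squared Pearson correlation instead), and the numbers in the example are made up for illustration.

```python
import math


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root-mean-square error between measured and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination, assuming r^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up measured and predicted values, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(rmse(measured, predicted), r_squared(measured, predicted))
```

RMSE carries the units of the property (°C for melting point, log units for Log S and Log P), so it is not comparable across the three properties, whereas r² is unitless.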